Differential meta-analysis of RNA-seq data from multiple studies
High-throughput sequencing is now regularly used for studies of the
transcriptome (RNA-seq), particularly for comparisons among experimental
conditions. For the time being, a limited number of biological replicates are
typically considered in such experiments, leading to low detection power for
differential expression. As the cost of such experiments continues to decrease, it is likely that
additional follow-up studies will be conducted to re-address the same
biological question. We demonstrate how p-value combination techniques
previously used for microarray meta-analyses can be used for the differential
analysis of RNA-seq data from multiple related studies. These techniques are
compared to a negative binomial generalized linear model (GLM) including a
fixed study effect on simulated data and real data on human melanoma cell
lines. The GLM with fixed study effect performed well for low inter-study
variation and small numbers of studies, but was outperformed by the
meta-analysis methods for moderate to large inter-study variability and larger
numbers of studies. To conclude, the p-value combination techniques illustrated
here are a valuable tool to perform differential meta-analyses of RNA-seq data
by appropriately accounting for biological and technical variability within
studies as well as additional study-specific effects. An R package metaRNASeq
is available on the R Forge
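As a rough illustration of the p-value combination idea described in this abstract, the sketch below combines per-study p-values for a single gene with Fisher's method and with a weighted inverse normal (Stouffer) method. It is a minimal standard-library sketch, not the metaRNASeq implementation: the function names, the example p-values, and the use of replicate counts as weights are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(log p) follows a chi-square with 2k df under H0."""
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # test statistic divided by 2
    # Closed-form chi-square survival function, valid for even degrees of freedom 2k.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def inverse_normal_combine(pvalues, weights=None):
    """Weighted inverse normal combination; p-values must lie strictly in (0, 1)."""
    if weights is None:
        weights = [1.0] * len(pvalues)
    std_normal = NormalDist()
    z = sum(w * std_normal.inv_cdf(1.0 - p) for w, p in zip(weights, pvalues))
    z /= math.sqrt(sum(w * w for w in weights))
    return 1.0 - std_normal.cdf(z)


# Hypothetical p-values for one gene in three studies, and hypothetical
# replicate counts used as weights.
per_study_pvalues = [0.04, 0.10, 0.03]
replicates = [3, 2, 4]
print(fisher_combine(per_study_pvalues))
print(inverse_normal_combine(per_study_pvalues, weights=replicates))
```

Both functions assume the per-study p-values are independent and come from tests of the same hypothesis, which is the setting of related studies addressing one biological question.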
Use of the score test as a goodness-of-fit measure of the covariance structure in genetic analysis of longitudinal data
Model selection is an essential issue in longitudinal data analysis since many different models have been proposed to fit the covariance structure. The likelihood criterion is commonly used and allows the fit of alternative models to be compared. Its value does not reflect, however, the potential improvement that can still be reached in fitting the data unless a reference model with the actual covariance structure is available. The score test approach does not require the knowledge of a reference model, and the score statistic has a meaningful interpretation in itself as a goodness-of-fit measure. The aim of this paper was to show how the score statistic may be separated into the genetic and environmental parts, which is difficult with the likelihood criterion, and how it can be used to check parametric assumptions made on variance and correlation parameters. Selection of models for genetic analysis was applied to a dairy cattle example for milk production
Estimation of genetic parameters for test day records of dairy traits in the first three lactations
Application of test-day models for the genetic evaluation of dairy populations requires the solution of large mixed model equations. The size of the (co)variance matrices required with such models can be reduced through the use of their first eigenvectors. Here, the first two eigenvectors of (co)variance matrices estimated for dairy traits in first lactation were used as covariables to jointly estimate genetic parameters of the first three lactations. These eigenvectors appear to be similar across traits and have a biological interpretation, one being related to the level of production and the other to persistency. Furthermore, they explain more than 95% of the total genetic variation. Variances and heritabilities obtained with this model were consistent with previous studies. High correlations were found among production levels in different lactations. Persistency measures were less correlated. Genetic correlations between second and third lactations were close to one, indicating that these can be considered as the same trait. Genetic correlations within lactation were high except between extreme parts of the lactation. This study shows that the use of eigenvectors can reduce the rank of (co)variance matrices for the test-day model and can provide consistent genetic parameters
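The rank-reduction idea in this abstract can be sketched on a toy example: extract the first two eigenvectors of a small covariance matrix and measure the share of total variance they carry. The 3 x 3 matrix below (early, mid and late lactation) is hypothetical and is not taken from the paper, and power iteration with deflation is just one simple way to obtain the leading eigenpairs of a symmetric matrix.

```python
import math


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]


def leading_eigenpair(matrix, n_iter=200):
    """Power iteration for the largest eigenvalue of a symmetric PSD matrix."""
    n = len(matrix)
    # Start vector with distinct components so it is not orthogonal to the
    # eigenvectors of the toy matrix used below.
    vec = [float(i + 1) for i in range(n)]
    for _ in range(n_iter):
        prod = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in prod))
        vec = [x / norm for x in prod]
    value = sum(v * p for v, p in zip(vec, mat_vec(matrix, vec)))  # Rayleigh quotient
    return value, vec


def deflate(matrix, value, vec):
    """Remove one eigen-direction so the next eigenpair becomes the leading one."""
    n = len(matrix)
    return [[matrix[i][j] - value * vec[i] * vec[j] for j in range(n)]
            for i in range(n)]


# Hypothetical genetic covariance matrix for early, mid and late lactation.
covariance = [[4.0, 3.0, 2.0],
              [3.0, 4.0, 3.0],
              [2.0, 3.0, 4.0]]

value1, vec1 = leading_eigenpair(covariance)
value2, vec2 = leading_eigenpair(deflate(covariance, value1, vec1))
trace = sum(covariance[i][i] for i in range(len(covariance)))
explained = (value1 + value2) / trace
print(vec1, vec2, explained)
```

In this toy matrix the first eigenvector has loadings of one sign (an overall level) and the second contrasts the two ends of the lactation, which mirrors the level and persistency interpretation given in the abstract.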
EM-REML estimation of covariance parameters in Gaussian mixed models for longitudinal data analysis
This paper presents procedures for implementing the EM algorithm to compute REML estimates of variance covariance components in Gaussian mixed models for longitudinal data analysis. The class of models considered includes random coefficient factors, stationary time processes and measurement errors. The EM algorithm allows separation of the computations pertaining to parameters involved in the random coefficient factors from those pertaining to the time processes and errors. The procedures are illustrated with Potthoff and Roy's data example on growth measurements taken on 11 girls and 16 boys at four ages. Several variants and extensions are discussed
A quasi-score approach to the analysis of ordered categorical data via a mixed heteroskedastic threshold model
This article presents an extension of the methodology developed by Gilmour et al. [19], for ordered categorical data, taking into account the heterogeneity of residual variances of latent variables. Heterogeneity of residual variances is described via a structural linear model on log-variances. This method involves two main steps: i) a 'marginalization' with respect to the random effects leading to quasi-score estimators; ii) an approximation of the variance-covariance matrix of the observations which leads to an analogue of the Henderson mixed model equations for continuous Gaussian data. This methodology is illustrated by a numerical example of footshape in sheep.
Genetic analysis of growth curves using the SAEM algorithm
The analysis of nonlinear function-valued characters is very important in genetic studies, especially for growth traits of agricultural and laboratory species. Inference in nonlinear mixed effects models is, however, quite complex and is usually based on likelihood approximations or Bayesian methods. The aim of this paper was to present an efficient stochastic EM procedure, namely the SAEM algorithm, which converges much faster than the classical Monte Carlo EM algorithm and Bayesian estimation procedures, does not require specification of prior distributions and is quite robust to the choice of starting values. The key idea is to recycle the simulated values from one iteration to the next in the EM algorithm, which considerably accelerates the convergence. A simulation study is presented which confirms the advantages of this estimation procedure in the case of a genetic analysis. The SAEM algorithm was applied to real data sets on growth measurements in beef cattle and in chickens. The proposed estimation procedure, like the classical Monte Carlo EM algorithm, provides significance tests on the parameters and likelihood-based model comparison criteria to compare the nonlinear models with other longitudinal methods
Detection and modelling of time-dependent QTL in animal populations
A longitudinal approach is proposed to map QTL affecting function-valued traits and to estimate their effect over time. The method is based on fitting mixed random regression models. The QTL allelic effects are modelled with random coefficient parametric curves and using a gametic relationship matrix. A simulation study was conducted in order to assess the ability of the approach to fit different patterns of QTL over time. It was found that this longitudinal approach was able to adequately fit the simulated variance functions and considerably improved the power of detection of time-varying QTL effects compared to the traditional univariate model. This was confirmed by an analysis of protein yield data in dairy cattle, where the model was able to detect QTL with a large effect at either the beginning or the end of the lactation that were not detected with a simple 305-day model
Reverse engineering gene regulatory networks using approximate Bayesian computation
Gene regulatory networks are collections of genes that interact with one
another and with other substances in the cell. By measuring gene expression over
time using high-throughput technologies, it may be possible to reverse
engineer, or infer, the structure of the gene network involved in a particular
cellular process. These gene expression data typically have a high
dimensionality and a limited number of biological replicates and time points.
Due to these issues and the complexity of biological systems, the problem of
reverse engineering networks from gene expression data demands a specialized
suite of statistical tools and methodologies. We propose a non-standard
adaptation of a simulation-based approach known as Approximate Bayesian
Computing based on Markov chain Monte Carlo sampling. This approach is
particularly well suited for the inference of gene regulatory networks from
longitudinal data. The performance of this approach is investigated via
simulations and using longitudinal expression data from a genetic repair system
in Escherichia coli. Comment: 16 pages, 11 figures
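To make the general mechanism of Approximate Bayesian Computation with MCMC sampling concrete, the sketch below runs a generic ABC-MCMC sampler on a deliberately simple toy problem: inferring the mean of a normal distribution with known unit variance. It is not the non-standard network adaptation proposed in the abstract; the function name, the uniform prior bounds, the tolerance `epsilon` and the proposal step size are all assumptions chosen for illustration.

```python
import random


def abc_mcmc(observed_mean, n_obs, n_iter=2000, epsilon=0.2, step=0.5, seed=1):
    """Generic ABC-MCMC for the mean of a unit-variance normal (toy problem)."""
    rng = random.Random(seed)
    lower, upper = -10.0, 10.0   # assumed uniform prior bounds
    theta = observed_mean        # assumed starting value near the data
    chain = []
    for _ in range(n_iter):
        # Symmetric random-walk proposal.
        proposal = theta + rng.gauss(0.0, step)
        if lower <= proposal <= upper:
            # Simulate a data set under the proposal instead of evaluating a likelihood.
            simulated = [rng.gauss(proposal, 1.0) for _ in range(n_obs)]
            simulated_mean = sum(simulated) / n_obs
            # With a uniform prior and symmetric proposal the Metropolis-Hastings
            # ratio is 1, so the move is accepted whenever the simulated summary
            # statistic falls within epsilon of the observed one.
            if abs(simulated_mean - observed_mean) <= epsilon:
                theta = proposal
        chain.append(theta)
    return chain


# Hypothetical observed summary: mean 2.0 computed from 20 observations.
chain = abc_mcmc(observed_mean=2.0, n_obs=20)
posterior_mean = sum(chain) / len(chain)
print(posterior_mean)
```

The likelihood is never evaluated: each proposal is judged only by how closely data simulated under it match the observed summary, which is what makes the approach usable when the likelihood is intractable.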